PCI Express to PCI Express based low latency interconnect scheme for clustering systems

ABSTRACT

PCI-Express (PCIE) is a Bus or I/O interconnect standard typically used for communication between a computer or computing system root complex and a plurality of peripheral devices, where the PCIE system typically includes a PCIE switch which has only a single inbound PCIE port. An outbound PCIE port on the computing system connects to the single inbound PCIE port of the PCIE switch. The invention discloses a PCIE based interconnect scheme enabling switching and inter-connection between a plurality of PCIE enabled systems, each having a PCIE root complex, forming a cluster. The technical advances and scalability of PCIE architecture can be applied to enable data transport between the systems forming the cluster. The PCIE network switch comprises a plurality of inbound ports such that each of the plurality of computing systems connects to an inbound port of the network switch for enabling data transmission and communication within the cluster.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 15/175,800 titled “PCI Express to PCI Express based low latency interconnect scheme for clustering systems” filed on Jun. 7, 2016, which is a continuation of U.S. application Ser. No. 14/588,937 titled “PCI Express to PCI Express based low latency interconnect scheme for clustering systems” filed on Jan. 3, 2015, currently U.S. Pat. No. 9,519,608, which is a continuation of U.S. patent application Ser. No. 13/441,883 titled “PCI Express to PCI Express based low latency interconnect scheme for clustering systems” filed on Apr. 8, 2012, which was abandoned, which is a continuation of U.S. patent application Ser. No. 11/242,463 titled “PCI Express to PCI Express based low latency interconnect scheme for clustering systems” filed on Oct. 4, 2005, which issued as U.S. Pat. No. 8,189,603 on May 29, 2012, all of which have a common inventor, and are hereby incorporated by reference for all that they contain.

TECHNICAL FIELD

The invention generally relates to providing high speed interconnect between systems within an interconnected cluster of systems.

BACKGROUND AND PRIOR ART

The lack of a high speed and low latency cluster interconnect scheme for data and information transport between systems has been recognized as a limiting factor to achieving high speed operation in clustered systems, and one needing immediate attention to resolve. The growth of interconnected and distributed processing schemes has made it essential that high speed interconnect schemes be defined and established to provide the speeds necessary to take advantage of the high speeds being achieved by data processing systems and to enable faster data sharing between interconnected systems.

There are today interconnect schemes that allow data transfer at high speeds. The most common and fast interconnect scheme existing today is the Ethernet connection, allowing transport speeds from 10 Mb/sec to as high as 10 Gb/sec. TCP/IP protocols used with Ethernet have high overhead with inherent latency that makes them unsuitable for some distributed applications. Further, TCP/IP protocol tends to drop data packets under high traffic congestion, which requires resending of the lost packets; this causes delays in data transfer and is not acceptable for high reliability system operation. Recent developments in optical transport also provide high speed interconnect capability. Efforts are under way in different areas of data transport to reduce the latency of the interconnect, as this is a limitation on growth of distributed computing, control and storage systems. All these require either changes in transmission protocols, re-encapsulation of data or modulation of data into alternate forms, with associated delays, increases in latency and associated costs.

DESCRIPTION

PCI Express is a Bus or I/O interconnect standard for use inside the computer or embedded system, enabling faster data transfers to and from peripheral devices. The standard is still evolving but has achieved a degree of stability such that other applications can be implemented using PCIE as a basis. A PCIE based interconnect scheme to enable switching and inter-connection between multiple PCIE enabled computing systems, each having its own PCIE root complex, such that the scalability of PCIE architecture can be applied to enable data transport between connected systems to form a cluster of systems, is proposed. These connected computing systems can be any computing, control, storage or embedded systems. The scalability of the interconnect will allow the cluster to grow the bandwidth between the computing systems as it becomes necessary, without changing to a different connection architecture.

What is Proposed

PCI Express (PCIE) has achieved a prominent place as the I/O interconnect standard for use inside computers, processing systems and embedded systems, allowing serial high speed data transfer to and from peripheral devices. The typical PCIE provides 2.5-3.8 GB transfer rate per link (this may change as the standard and data rates change). The PCIE standard is evolving fast, becoming faster, starting to become firm, and being used within more and more systems. Typically each PCIE based system has a root complex which controls all connections and data transfers to and from connected peripheral devices through PCIE peripheral end points or peripheral modules. What is disclosed is the use of PCIE standard based peripherals, enabled for interconnection to similar PCIE standard based peripherals and connected directly using data links, as an interconnect between multiple systems, typically through one or more network switches. This interconnect scheme uses PCIE based protocols for data transfer over direct physical connection links between the PCIE based peripheral devices (see FIG. 1), without any intermediate conversion of the transmitted data stream to other data transmission protocols or encapsulation of the transmitted data stream within other data transmission protocols, thereby reducing the latencies of communication between the connected PCIE based systems within the cluster. The PCIE standard based peripheral enabled for interconnection at a peripheral end point of the system, by connecting to the switch over direct PCIE standard based peripheral to PCIE standard based peripheral data links, provides for an increase in the number of links per connection as the bandwidth needs of system interconnections increase, and thereby allows scaling of the bandwidth available within any single interconnect or the system of interconnects as required.

Some Advantages of the Proposed Connection Scheme:

1. Reduced latency of data transfer, as conversion from PCIE to other protocols like Ethernet is avoided during transfer.

2. The number of links per connection can scale from X1 to larger numbers, X32 or even X64, as PCIE capabilities increase to cater to the connection bandwidth needed. Minimum change in interconnect architecture is needed with increased bandwidth, enabling easy scaling with need.

3. Any speed increase in the link connection due to technology advance is directly applicable to the interconnection scheme.

4. Standardization of the PCIE based peripheral will make components easily available from multiple vendors, making the implementation of the interconnect scheme easier and cheaper.

5. The PCIE based peripheral to PCIE based peripheral links in connections allow ease of software control and provide reliable bandwidth.

6. The use of standardized PCIE based peripheral modules enabled for interconnection as the outbound port, and the use of a PCI-Express enabled port on the PCI-Express based network switch for interconnection between PCI-Express based network switches, will allow for easy expansion of the cluster as computing needs grow.

7. The PCIE links and switches are agnostic to the data transmission and can be updated with new technology as it becomes available, to speed up data transfer between clustered PCI-Express enabled computing systems (also called PCIE computing systems, which are computing systems using a PCI-Express bus for peripheral component interconnection, where the PCIE bus is under the control of a root complex of a respective computing system), without changing the capabilities and protocols of the interconnect scheme.

DESCRIPTION OF FIGURES

FIG. 1 is a typical interconnected (multi-system) cluster, shown with eight systems connected in a star architecture using direct connected data links between PCIE standard based peripheral and PCIE standard based peripheral.

FIG. 2 is a cluster using multiple interconnect modules or switches to interconnect smaller clusters.

Explanation of Numbering and Lettering in FIG. 1

(1) to (8): Number of systems interconnected in FIG. 1.

(9): Switch sub-system.

(10): Software configuration and control input for the switch.

(1 a) to (8 a): PCI Express based peripheral modules (PCIE Modules) attached to systems.

(1 b) to (8 b): PCI Express based peripheral modules (PCIE Modules) at switch.

(1L) to (8L): PCIE based peripheral module to PCIE based peripheral module connections having n-links (n-data links).

Explanation of Numbering and Lettering in FIG. 2

(12-1) and (12-2): Clusters.

(9-1) and (9-2): Interconnect modules or switch sub-systems.

(10-1) and (10-2): Software configuration inputs.

(11-1) and (11-2): Switch to switch interconnect module in the cluster.

(11L): Switch to switch interconnection.

DESCRIPTION OF INVENTION

PCI Express is a Bus or I/O interconnect standard for use inside the computer or embedded system, enabling faster data transfers to and from peripheral devices. The standard is still evolving but has achieved a degree of stability such that other applications can be implemented using PCIE as a basis. A PCIE based interconnect scheme to enable switching and inter-connection between multiple PCIE enabled systems, each having its own PCIE root complex, such that the scalability of PCIE architecture can be applied to enable data transport between connected systems to form a cluster of systems, is proposed. These connected systems can be any computing, control, storage or embedded system. The scalability of the interconnect will allow the cluster to grow the bandwidth between the systems as it becomes necessary, without changing to a different connection architecture.

FIG. 1 is a typical cluster interconnect. The multi-system cluster shown consists of eight units or systems {(1) to (8)} that are to be interconnected. Each system is a PCI Express (PCIE) based system with a PCIE root complex for control of data transfer to and from connected peripheral devices via PCIE peripheral modules, as is standard for PCIE based systems. Each system to be interconnected has at least one PCIE based peripheral module {(1 a) to (8 a)} as an IO module at the interconnect port, enabled for system interconnection, with n links built into or attached to the system. (9) is an interconnect module or switch sub-system, which has a number of PCIE based connection modules equal to or greater than the number of systems to be interconnected, in the case of FIG. 1 this number being eight {(1 b) to (8 b)}, that can be interconnected for data transfer through the switch. A software based control input (10) is provided to configure and/or control the operation of the switch and enable connections between the switch ports for transfer of data. Link connections {(1L) to (8L)} attach the PCIE based peripheral modules 1 a to 8 a, enabled for interconnection on the respective systems 1 to 8, to the PCIE based peripheral modules 1 b to 8 b on the switch with n links. The value of n can vary depending on the connection bandwidth required by the system.
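The star arrangement of FIG. 1 can be written down as a small data model. The following is only an illustrative sketch: the `Link` class, the `build_star` helper and the choice of four links per connection are assumptions made for the example and are not part of the disclosure. The module labels are written without the space, so `1a` stands for module (1 a).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One system-to-switch connection of FIG. 1 (hypothetical model)."""
    name: str           # link label, e.g. "1L"
    system_module: str  # PCIE module at the system, e.g. "1a"
    switch_module: str  # PCIE module at the switch, e.g. "1b"
    lanes: int          # the value n: number of links in the connection


def build_star(num_systems: int, lanes: int) -> dict:
    """Return one Link per system, keyed by system number (1-based)."""
    return {
        i: Link(name=f"{i}L", system_module=f"{i}a",
                switch_module=f"{i}b", lanes=lanes)
        for i in range(1, num_systems + 1)
    }


# Eight systems as in FIG. 1; n = 4 is an arbitrary illustrative value.
cluster = build_star(num_systems=8, lanes=4)
print(cluster[5])
```

Keeping n as a field of each connection mirrors the statement that the number of links can vary with the bandwidth a system requires.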

When data has to be transferred between, say, system 1 and system 5, in the simple case the control is used to establish an internal link between PCIE based peripheral modules 1 b and 5 b at the respective ports of the switch. A hand shake is established between the outbound communication enabled PCIE based peripheral module (PCIE Module) 1 a and the inbound PCIE module 1 b at the switch port, and between the inbound PCIE module 5 b at the switch port and the outbound communication enabled PCIE module 5 a. This provides a through connection between the PCIE modules 1 a and 5 a through the switch, allowing data transfer. Data can then be transferred at speed between the modules and hence between systems. In more complex cases data can also be transferred and queued in storage implemented in the switch at the ports, and then, when links are free, transferred out to the right systems at speed.
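The connect-then-transfer behaviour, including queuing at the ports when a link is busy, might be modelled as follows. This is a minimal sketch under stated assumptions: the `ClusterSwitch` class and its method names are hypothetical, the hand shake is reduced to a check that both ports are free, and no real PCIE signalling is represented.

```python
from collections import deque


class ClusterSwitch:
    """Toy model of switch (9): internal links between ports plus port queues."""

    def __init__(self, num_ports: int):
        ports = range(1, num_ports + 1)
        self.peer = {}                              # port -> port it is linked to
        self.pending = {p: deque() for p in ports}  # queued (dst, payload) per source port
        self.received = {p: [] for p in ports}      # payloads delivered to each system

    def connect(self, a: int, b: int) -> bool:
        """Establish an internal link if both ports are free."""
        if a == b or a in self.peer or b in self.peer:
            return False
        self.peer[a], self.peer[b] = b, a
        return True

    def release(self, a: int) -> None:
        """Tear down the internal link that port a takes part in, if any."""
        b = self.peer.pop(a, None)
        if b is not None:
            self.peer.pop(b, None)

    def send(self, src: int, dst: int, payload: str) -> str:
        """Deliver immediately over a link, or queue at the source port."""
        if self.peer.get(src) == dst or self.connect(src, dst):
            self.received[dst].append(payload)
            return "delivered"
        self.pending[src].append((dst, payload))
        return "queued"

    def drain(self, src: int) -> int:
        """Retry queued packets in order; stop at the first one still blocked."""
        sent = 0
        queue = self.pending[src]
        while queue:
            dst, payload = queue[0]
            if not (self.peer.get(src) == dst or self.connect(src, dst)):
                break
            queue.popleft()
            self.received[dst].append(payload)
            sent += 1
        return sent


switch = ClusterSwitch(num_ports=8)
first = switch.send(1, 5, "block A")   # links ports 1 and 5, then delivers
second = switch.send(2, 5, "block B")  # port 5 is busy, so the packet is queued
switch.release(1)                      # the 1-5 link becomes free
drained = switch.drain(2)              # queued packet now goes out to system 5
```

In this model the queue sits at the source port; a real switch could equally buffer at the destination port, which the text leaves open.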

Multiple systems can be interconnected at one time to form a multi-system that allows data and information transfer and sharing through the switch. It is also possible to connect smaller clusters together to take advantage of the growth in system volume by using an available connection scheme that interconnects the switches that form a node of the cluster.

If the need for higher bandwidth and low latency data transfers between systems increases, the connections can grow by increasing the number of links connecting the PCIE modules between the systems in the cluster and the switch, without completely changing the architecture of the interconnect. This scalability is of great importance in retaining flexibility for growth and scaling of the cluster.
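The effect of adding links can be illustrated with a short calculation. The per-link figures used here are assumptions taken from first-generation PCI Express signalling (2.5 GT/s per lane with 8b/10b encoding, giving 250 MB/s of payload bandwidth per lane in each direction); they are not stated in this description, and later generations use different rates and encodings.

```python
# Assumed first-generation PCIE figures (not taken from this description).
RAW_RATE_MT_S = 2500   # raw signalling rate per lane, in megatransfers per second
ENCODED_BITS = 10      # 8b/10b encoding sends 10 line bits ...
PAYLOAD_BITS = 8       # ... for every 8 bits of data


def link_bandwidth_mb_s(lanes: int) -> int:
    """Usable bandwidth per direction, in MB/s, for a connection of `lanes` links."""
    usable_mbit_s = RAW_RATE_MT_S * PAYLOAD_BITS // ENCODED_BITS
    return usable_mbit_s * lanes // 8  # 8 bits per byte


for lanes in (1, 4, 8, 16, 32):
    print(f"x{lanes}: {link_bandwidth_mb_s(lanes)} MB/s per direction")
```

Bandwidth grows linearly with the number of links while the connection architecture stays the same, which is the scaling property described above.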

It should be understood that the system may consist of peripheral devices, storage devices and processors and any other communication devices. The interconnect is agnostic to the type of device as long as they have a PCIE module at the port to enable the connection to the switch. This feature will reduce the cost of expanding the system by changing the switch interconnect density alone for growth of the multi-system.

PCIE is currently being standardized, which will enable the use of existing PCIE modules from different vendors to reduce the overall cost of the system. In addition, using a standardized module in the system as well as the switch will allow the cost of software development to be reduced and, in the long run, allow available software to be used to configure and run the systems.

As the expansion of the cluster in terms of number of systems connected, bandwidth usage and control will all be cost effective, it is expected that the overall system cost can be reduced and overall performance improved by standardized PCIE module use with standardized software control.

Typical connect operation may be explained with reference to two of the systems, for example system (1) and system (5). System (1) has a PCIE module (1 a) at the interconnect port, and that is connected by the connection link or data-link or link (1L) to a PCIE module (1 b) at the IO port of the switch (9). System (5) is similarly connected to the switch through the PCIE module (5 a) at its interconnect port to the PCIE module (5 b) at the switch (9) IO port by link (5L). Each PCIE module operates for transfer of data to and from it by standard PCI Express protocols, provided by the configuration software loaded into the PCIE modules and switch. The switch operates by the software control and configuration loaded in through the software configuration input.

FIG. 2 is that of a multi-switch cluster. As the need to interconnect a larger number of systems increases, it will be optimum to interconnect multiple switches of the clusters to form a new larger cluster. Such a connection is shown in FIG. 2. The connection shown is for two smaller clusters (12-1 and 12-2) interconnected using PCIE modules that can be connected together using any low latency switch to switch connection (11-1 and 11-2), connected using interconnect links (11L) to provide sufficient bandwidth for the connection. The switch to switch connection transmits and receives data and information using any suitable protocol, and the switches provide the interconnection internally through the software configuration loaded into them.
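Forwarding between the two clusters of FIG. 2 can be sketched as below. This is an illustrative model only: the `Switch` class, its `route` method and the system names `sys1` to `sys3` are hypothetical, and the switch to switch connection (11L) is reduced to a direct object reference.

```python
class Switch:
    """Toy model of a switch sub-system (9-1 or 9-2) with one expansion link."""

    def __init__(self, name: str, local_systems):
        self.name = name
        self.local = set(local_systems)  # systems attached directly to this switch
        self.expansion_peer = None       # switch reached over the 11L connection
        self.delivered = []              # (system, payload) pairs handed to local systems

    def attach(self, other: "Switch") -> None:
        """Join two switches through their switch to switch interconnect modules."""
        self.expansion_peer = other
        other.expansion_peer = self

    def route(self, dst: str, payload: str, hops=None) -> list:
        """Deliver locally if possible, otherwise forward to the peer switch.

        Returns the list of switch names the payload passed through.
        """
        hops = (hops or []) + [self.name]
        if dst in self.local:
            self.delivered.append((dst, payload))
            return hops
        peer = self.expansion_peer
        if peer is None or peer.name in hops:
            raise LookupError(f"no route to {dst} from {self.name}")
        return peer.route(dst, payload, hops)


# Hypothetical system names; FIG. 2 does not number the individual systems.
switch_1 = Switch("9-1", {"sys1", "sys2"})
switch_2 = Switch("9-2", {"sys3"})
switch_1.attach(switch_2)

local_path = switch_1.route("sys1", "within cluster 12-1")
remote_path = switch_1.route("sys3", "across to cluster 12-2")
print(local_path, remote_path)
```

The check on `hops` stops a payload from bouncing between the two switches when the destination is attached to neither.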

The following are some of the advantages of the disclosed interconnect scheme:

1. Provide a low latency interconnect for the cluster.

2. Use of PCI Express based protocols for data and information transfer within the cluster.

3. Ease of growth in bandwidth as the system requirements increase by increasing the number of links within the cluster.

4. Standardized PCIE component use in the cluster reduces initial cost.

5. Lower cost of growth due to standardization of hardware and software.

6. Path of expansion from a small cluster to larger clusters as need grows.

7. Future proofed system architecture.

8. Any speed increase in the switch and link connections due to technology advance is directly applicable to the interconnection scheme.

The circuit implementations can be any or a combination of Integrated-circuit, FPGA, Silicon on Chip (SOC), chip on board (COB), optical, or hybrid circuit implementations. The disclosed interconnect scheme provides advantages for low latency multi-system cluster growth that are not available from any other source.

While the invention has been described in terms of several embodiments, those of ordinary skill in the art will recognize that the invention is not limited to the embodiments described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. Multiple existing methods and methods developed using newly developed technology may be used to establish the hand shake between systems and to improve data transfer and latency. The description is thus to be regarded as illustrative instead of limiting and capable of using any new technology developments in the field of communication and data transfer. There are numerous other variations to different aspects of the invention described above, which in the interest of conciseness have not been provided in detail. Accordingly, other embodiments are limited only within the scope of the claims.

1. A PCI-Express switch device for transferring information between at least first and second PCI-Express enabled computing systems, wherein each of the PCI-Express enabled computing systems comprise a PCI-Express bus system that is under control of a PCI-Express root complex, the PCI-Express switch device comprising: a plurality of inbound PCI-Express ports, wherein at least first and second inbound PCI-Express ports of the plurality of inbound PCI-Express ports are configured for sending and receiving data and network packets in communication with an outbound PCI-Express port on each of the first and second PCI-Express enabled computing systems respectively.
2. The PCI-Express switch device of claim 1, further comprising a first PCI-Express expansion port, wherein the first PCI-Express expansion port utilizes PCI-Express protocol and is enabled to connect to a second PCI-Express expansion port on a second PCI-Express switch device.
3. The PCI-Express switch device of claim 1, wherein the switch device further comprises one or more outbound PCI-Express ports for communicating with PCI-Express peripheral devices.
4. The PCI-Express switch device of claim 1, wherein the PCI-Express switch device is configured to transfer data between the PCI-Express enabled computing systems using PCI-Express protocol.
5. The PCI-Express switch device of claim 1, wherein the PCI-Express switch device comprises any of a silicon device or a circuit module.
6. A PCI-Express switch device for transferring information between at least first and second PCI-Express enabled computing systems, wherein each of the PCI-Express enabled computing systems comprises a PCI-Express bus system that is under control of a PCI-Express root complex, the PCI-Express switch device comprising: at least first and second inbound PCI-Express ports wherein the inbound PCI-Express ports are connectable to at least the first and the second PCI-Express enabled computing systems respectively, wherein the PCI-Express switch comprises a plurality of inbound PCI-Express ports; wherein the PCI-Express switch receives via at least the first inbound PCI-Express port of the PCI-Express switch, a first data packet transmitted from an outbound PCI-Express port on the first PCI-Express enabled computing system; and wherein the PCI-Express switch transmits the first data packet from the second inbound PCI-Express port of the PCI-Express switch, the first data packet subsequently being receivable by an outbound PCI-Express port on the second PCI-Express enabled computing system.
7. The PCI-Express switch device of claim 6, further comprising a first PCI-Express expansion port, wherein the first PCI-Express expansion port utilizes PCI-Express protocol and is enabled to connect to a second PCI-Express expansion port on a second PCI-Express switch device.
8. The PCI-Express switch device of claim 7, wherein the PCI-Express switch device is configured to transfer data between the PCI-Express enabled computing systems using PCI-Express protocol.
9. The PCI-Express switch device of claim 6, wherein the PCI-Express switch device comprises a plurality of inbound PCI-Express ports.
10. The PCI-Express switch device of claim 6, wherein the PCI-Express switch device comprises any of a silicon device or a circuit module.
11. A PCI-Express switch device for transferring data between at least first and second PCI-Express enabled computing systems, the PCI-Express switch device comprising: a plurality of inbound PCI-Express ports that enable transferring data from at least the first PCI-Express enabled computing system to at least the second PCI-Express enabled computing system using PCI-Express protocol, wherein each of the PCI-Express enabled computing systems comprise a PCI-Express bus under control of a root complex, wherein: a) a first inbound PCI-Express port on the PCI-Express switch device is configurable to receive a first data packet from an outbound PCI-Express port on the first PCI-Express enabled computing system; b) the PCI-Express switch device transfers the first data packet from the first inbound PCI-Express port on the PCI-Express switch to a second inbound PCI-Express port on the PCI-Express switch; and c) the second inbound PCI-Express port on the PCI-Express switch device is configurable to transfer the first data packet from the second inbound PCI-Express port on the PCI-Express switch device to an outbound PCI-Express port on the second PCI-Express enabled computing system.
12. The PCI-Express switch device of claim 11, further enabling transfer of a second data packet from at least the second PCI-Express enabled computing system to at least the first PCI-Express enabled computing system using PCI-Express protocol, wherein: d) the PCI-Express switch device is enabled to receive the second data packet from the outbound PCI-Express port on the second PCI-Express enabled computing system; e) the PCI-Express switch device transfers the second data packet from the second inbound PCI-Express port on the PCI-Express switch device to the first inbound PCI-Express port on the PCI-Express switch; and f) PCI-Express switch device is enabled to transfer the second data packet from the first inbound PCI-Express inbound port on the PCI-Express switch to the outbound PCI-Express port on the first computing system.
13. The PCI-Express switch device of claim 12, further comprising a first PCI-Express expansion port, wherein the first PCI-Express expansion port utilizes PCI-Express protocol and is enabled to connect to a second PCI-Express expansion port on a second PCI-Express switch device.
14. The PCI-Express switch device of claim 13 wherein a data packet received by the PCI-Express switch from either the first or second PCI-Express enabled computing systems, is transmitted via the first PCI-Express expansion port to the second PCI-Express expansion port on the second PCI-Express switch device, whereby the second PCI-Express switch device transmits the data packet via an inbound PCI-Express port on the second PCI-Express switch device to an outbound PCI-Express port on a third PCI-Express enabled computing system.
15. The PCI-Express switch device of claim 11, wherein the PCI-Express switch device comprises one or more outbound PCI-Express ports for communicating with PCI-Express peripheral devices.
16. The PCI-Express switch device of claim 12, wherein data transfer within the PCI-Express enabled computing systems over the PCI Express bus that is connectable to the PCI-Express switch device, is under control of the root complex.
17. The PCI-Express switch device of claim 12, wherein the PCI-Express switch device is configured to transfer data between the PCI-Express enabled computing systems using PCI-Express protocol.
18. The PCI-Express switch device of claim 11, wherein the PCI-Express switch device comprises any of a silicon device or a circuit module.